Predicting Sporadic Grid Data Transfers
Abstract
The increasingly common practice of (1) replicating datasets and (2) using resources as distributed data stores in Grid environments has led to the problem of determining which replica can be accessed most efficiently. Because the components along the end-to-end path linking these locations have diverse performance characteristics and varying load, selecting one replica location from among many requires accurate predictions of end-to-end data transfer times between sources and sinks. In this paper, we present a prediction system that combines end-to-end application throughput observations with network load variations, drawing on their respective merits: the former captures whole-system performance and the latter captures variations in load patterns. We develop a set of regression models to derive predictions that characterize the effect of network load variations on file transfer times. We apply these techniques to the GridFTP data movement tool, part of the Globus Toolkit™, and observe gains of up to 10% in prediction accuracy compared with approaches based on past system behavior in isolation.
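As a rough illustration of the idea in the abstract, the sketch below fits a single linear regression of observed transfer throughput against a network load measurement, then uses the fit to estimate a transfer time. It is not the paper's actual model set, which the abstract does not specify. The function names, the choice of a one-variable least-squares fit, and all sample values are assumptions made for illustration only.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict_transfer_time(file_size_mb, probe_mbps, intercept, slope):
    """Estimate transfer time in seconds from a current network probe."""
    throughput_mb_s = intercept + slope * probe_mbps
    if throughput_mb_s <= 0:
        raise ValueError("model predicts non-positive throughput")
    return file_size_mb / throughput_mb_s


# Hypothetical history: network probe readings (Mbit/s) paired with the
# end-to-end throughput (MB/s) of past transfers on the same path.
probe_history = [10.0, 20.0, 30.0, 40.0]
throughput_history = [2.0, 4.0, 6.0, 8.0]

intercept, slope = fit_linear(probe_history, throughput_history)

# Regression-based estimate for a 100 MB file when the probe reads 25 Mbit/s.
regression_estimate = predict_transfer_time(100.0, 25.0, intercept, slope)

# Baseline that ignores current load and uses only past throughput.
history_only_estimate = 100.0 / mean(throughput_history)
```

Comparing `regression_estimate` with `history_only_estimate` on held-out transfers is one simple way to check whether adding the load measurement improves prediction accuracy on a given path.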
Similar resources
Predicting Scientific Grid Data Transfer Characteristics
Big data scientists routinely transfer massive amounts of data. By understanding and modelling different aspects of these data transfers, we can make using big data more efficient and user-friendly. In this paper, we first develop a set of data storage location prediction heuristics. These heuristics help big data scientists manage and discover locations to transfer their data from and to. We s...
Evaluating and Enhancing the Use of the GridFTP Protocol for Efficient Data Transfer on the Grid
Grid applications often require large data transfers along heterogeneous networks having different latencies and bandwidths, therefore efficient support for data transfer is a key issue in Grid computing. The paper presents a performance evaluation of the GridFTP protocol along some typical network scenarios, giving indications and rules of thumb useful to select the “best” GridFTP parameters. ...
GridTorrent: Optimizing data transfers in the Grid with collaborative sharing
As Grid systems expand and become more and more popular, there is a growing need for efficient, scalable and robust data transfer mechanisms that can deal effectively with large file transfers and flash crowd situations. In this paper, we address the problem of data transfer optimization by presenting GridTorrent, a modified BitTorrent protocol, tightly coupled with modern Grid middleware compon...
The CMS PhEDEx System: a Novel Approach to Robust Grid Data Distribution
The CMS experiment has taken a novel approach to Grid data distribution. Instead of having a central processing component making global decisions on replica allocation, CMS has a data management layer composed of a series of collaborating agents; the agents are persistent, stateless processes which manage specific parts of replication operations at each site in the distribution network. The age...
Network-Aware HEFT Scheduling for Grid
We present a network-aware HEFT. The original HEFT does not take care of parallel network flows while designing its schedule for a computational environment where computing nodes are physically at distant locations. In the proposed mechanism, such data transfers are stretched to their realistic completion time. A HEFT schedule with stretched data transfers exhibits the realistic makespan of the...